Benchmarking Parallel Performance on Many-Core Processors
نویسندگان
چکیده
With the emergence of many-core processor architectures onto the HPC scene, concerns arise regarding the performance and productivity of numerous existing parallel-programming tools, models, and languages. As these devices begin augmenting conventional distributed cluster systems in an evolving age of heterogeneous supercomputing, proper evaluation and profiling of many-core processors must occur in order to understand their performance and architectural strengths with existing parallel-programming environments and HPC applications. This paper presents and evaluates the comparative performance between two many-core processors, the Tilera TILE-Gx8036 and the Intel Xeon Phi 5110P, in the context of their applications performance with the SHMEM and OpenMP parallel-programming environments. Several applications written or provided in SHMEM and OpenMP are evaluated in order to analyze the scalability of existing tools and libraries on these many-core platforms. Our results show that SHMEM and OpenMP parallel applications scale well on the TILE-Gx and Xeon Phi, but heavily depend on optimized libraries and instrumentation.
منابع مشابه
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
متن کاملDesign, Evaluation, and Implementation of an Accurate Benchmarking Methodology for Multi- Core Processors Based Systems
The emergence of multi-core architectures has lead to the challenge of its efficient utilization. A comprehensive understanding of performance issues for these architectures is required to materialize any performance gains. This research conducts measurement based performance benchmarking and evaluation for multi-core processors based systems. It comprises of four phases: (1) design of an accur...
متن کاملBenchmarking and Optimising Software for Improved Multiprocessor Performance
Parallel computers have become widely available with the introduction of multicore processors, however software will need to be optimised to make use of the additional processors. A number of tools can aid the process of optimising code for parallel execution, such as parallel debuggers, profilers and parallel libraries. This paper describes an approach to evaluating the effectiveness of a sele...
متن کاملEfficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
متن کاملGPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models
Many scientific codes consist of memory bandwidth bound kernels — the dominating factor of the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major advantage of many-core devices such as General Purpose Graphics Processing Units (GPGPUs) and the Intel Xeon Phi is their focus on providing increased memo...
متن کامل